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We  present  a  solution  to  a  problem  of  considerable  practical  interest 
;jhich  arises  in  the  design  of  Management  Information  Systems.   After  pre- 
senting an  example  of  the  problem,  we  shall  formalize  it. 

Example:   Suppose  we  have  a  personnel  file  containing  complete  data  on 
all  our  20,000  workers.   The  Operations  Research  Group,  wishing  to  conduct 
a  study  of  absenteeism  among  older  workers,  has  created  a  much  smaller  file 
:ontaining  the  records  of  all  those  aged  60  or  over  as  well  as  those  of  any 
age  who  were  absent  10  or  more  days  in  the  last  year.   Let  us  suppose  that 
this  file  is  1500  records  long. 

The  personnel  department  now  wishes  to  obtain  the  names  of  all  employees 
rfithin  2  years  of  compulsory  retirement  at  age  65.   The  desired  list  can  be 
obtained  either  from  the  large  file  owned  by  the  personnel  department  or  from 
the  smaller  file  owned  by  the  Operations  Research  Group.   Question:   How 
•70uld  the  information  system  "know"  that  the  list  could  be  obtained  from 
the  shorter  file  (at  some  considerable  saving)? 

Formalization:   Let  a  "record"  be  a  collection  of  integers  k^ , . . . ,k. , . . . ,k 

[n  general,  the  value  of  k.  for  any  record  implies  nothing  whatsoever  about 

the  value  of  k.  for  another  record.   Let  a  "file"  be  a  collection  of  m  records. 
J 

Let  there  be  a  special  file  called  the  "master  file"  which  consists  of  all 
records  of  interest. 

Let  a  "request"  R.  be  a  function  which  maps  the  integer  contents  of  a 
record  into  the  values  "true"  and  "false".   Let  a  "strip  file"  S  be  a  file 
produced  as  follows:   The  master  file  is  read  record  by  record.   As  each 
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record  is  read,  the  request  R.  is  applied  to  the  integer  contents  of  that 
record.   If  and  only  if  the  value  of  R  is  "true",  the  record  is  copied  to 
the  strip  file  S  . 

Further,  let  the  data  base  at  any  moment  consist  of  the  master  file  and 

p  strip  files  S, S  ,...,S  ,  produced  from  requests  R  , . . . ,R. , . . , ,R  , 

respectively. 

Now,  a  new  request  R  ^  is  entered  into  the  system.   If  another  request, 

say  R  ,  can  be  found  such  that  R  ,  implies  R  ,  then  it  must  be  true  that 

all  of  the  records  in  the  master  file  which  would  cause  R  ,,  to  be  true  are 

p+1 

also  in  S  .   In  other  words,  the  records  in  S  , ,  will  be  a  subset  of  the 
a  p+1 

records  in  S  .   In  this  case,  there  is  probably  an  advantage  in  creating 

3. 

S  ,T  from  S  rather  than  from  the  master  file. 
p+1       a 

Procedure :   Determining  whether  some  request  R  implies  some  other 
request  R,  can  be  accomplished  by  a  technique  called  "recursive  residual 
reduction".    First  let  us  consider  requests  formed  in  terms  of  logical  vari- 
ables (i.e.,  those  taking  on  only  the  values  "true"  and  "false"  (abbreviated 
as  1  and  0  respectively) ;  three  Boolean  operators  *  ("and") ,  +  ("inclusive  or") , 
and  '  ("not");  and  parentheses;  all  with  their  customary  meanings.   (We  assume 
that  *  takes  precedence  over  +  in  the  hierarchy  of  operators.) 

We  will  use  the  capital  letters  A-H  as  logical  variables.   Consider  two 
requests: 


R,   =  A+B 


R^  =  A*(C+B)' 


-  2  - 


R  is  true  if  either  A  or  B  (or  both)  is  true,  and  R„  is  true  if  A  is  true, 
but  neither  B  nor  C  is  true.   R-  implies  R, ,  since  R^  is  always  true  if  R^ 

is  true.   This  fact  can  be  determined  by  a  simple  procedure,  based  on  the 

2 
"Expansion  Theorem"  ,  which  states  that  f(x- x . , . . . ,x  )  =  x  .*f (x^  , . . . ,1 , . . 

X  '*f (x  , . . . ,0, . . . ,x  )  where  f  is  any  Boolean  expression  in  logical  variables 

x. l,...,x  .   The  expressions  f (x. , . . . ,1, . . . ,x  )  and  f(x.  ,...,0 x  ) 

are  called  the  "residue  about  x  "  and  the  "residue  about  x'"  respectively. 

Implication  can  be  determined  by  the  following  recursive  procedure: 

Let  U  and  V  be  two  Boolean  expressions. 

1.  U  does  not  imply  V  if  U  =  1  and  V  =  0. 

2.  U  implies  VisU=O^V=lor  both. 

3.  If  neither  rule  1  nor  rule  2  applies,  choose  a  logical  variable,  say 
Q,  appearing  in  either  U,  V,  or  both.   (Call  this  variable  the  "pivot  vari- 
able".)  Take  residues  of  U  and  V  about  Q  and  Q'.   U  implies  V  if  and  only  if 
the  residue  of  U  about  Q  implies  the  residue  of  V  about  Q  and  the  residue 

of  U  about  Q'  implies  the  residue  of  V  about  Q'. 

Reducing  a  residue  to  simpler  form  is  achieved  by  applying  the  following 
rules:  Let  E  be  any  logical  expression  (i.e.,  a  logical  variable,  a  constant 
0  or  1,  or  an  expression  enclosed  in  parentheses. 

a)  1  +  E  =  E  +  1  =  1 

b)  1*E  =  E*1  =  E 

c)  0  +  E  =  E  =  0  =  E 

d)  0*E  =  E*0  =  0 

e)  0*  =  1 

f)  1'  =  0 
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g)   parentheses  should  be  removed  where  not  needed  for  operations  hier- 
archy, i.e.,  from  any  single  logical  variable  or  constant;  or  from  a  complete 
expression  where  neither  preceded  nor  followed  by  an  operator. 

Let  us  go  back  to  R,  =  A  +  B 

R^  =  A*(C+B) ' 

and  apply  our  rules  to  see  whether  R„  implies  R, . 

First,  we  see  that  neither  rule  1  nor  rule  2  applies  directly,  so  in- 
voking rule  3  and  choosing  A  as  pivot  variable,  we  have 

Residue  about  A  Residue  about  A' 

R^  1  +  B  =  1  0  +  B  =  B 

Rj         1  *CC+B)'  =  (C+B)'  0  *CC+B)'  =  0 


By  rule  2,  it  will  be  seen  that  the  residue  of  R-  about  A  implies  the  residue 
of  R^  about  A  (because  the  residue  of  R^  about  A  is  1) ;  and  the  residue  of  R 
about  A'  implies  the  residue  of  R^  about  A'  (because  the  residue  of  R„  about 
A'  is  0) .   Therefore  R^  implies  R, . 

In  practice  we  would  probably  not  represent  the  logical  expressions  sym- 
bolically, but  rather  as  a  "Polish"  string  of  operators  and  operands  with  all 

3 
negations  resolved  by  De  Morgan's  theorem. 

Now  let  us  return  to  our  original  problem  of  two  personnel  files.   We 

must  extend  our  language  to  include  the  relational  operators  =  ,  ?',>,>,  <, 

^,   .   Suppose  that  integers  k  and  k.  represent  age  as  of  January  1  and  days 

absent  last  year  respectively.   The  file  owned  by  the  Operations  Research 

Group  was  produced  by  the  request  R^  =  (k.Z-60)  +  (k  2llO).   The  file  desired 
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by  the  personnel  department  could  be  described  by  R,  =  (k  Z  63) .  Our  first 
step  is  to  create  logical  variables  corresponding  to  each  relational  opera- 
tion.  Let: 

A  =   (k.i:60) 

B  =   (k.^lO) 

C  =   (k.2  63). 
That  we  are  permitted  to  do  this  is  clear,  since  a  relational  operation  is, 
for  any  record,  either  true  or  false.   Our  requests  are  now 

R-   =  A  +  B 

R,   =  C 
4 

But  we  know  that  both  A  and  C  are  derived  from  the  same  integer,  namely,  k.. 
In  fact,  if  C  is  true,  A  must  also  be  true.   Therefore,  C  implies  A.   Recog- 
nizing this  fact  makes  us  treat  the  Expansion  Theorem  more  generally:   Any 
residue  about  C  will  substitute  1  for  both  A  and  C.   Any  residue  about  A' 
will  substitute  0  for  both  A  and  C. 

The  reasoning  behind  this  is  clear.   The  residue  of  U  about  X  is  simply 
the  form  of  U,  remaining  if  X  is  assumed  true.   But  if  X  implies  Y,  then  the 
residue  of  U  about  X  must  also  assume  Y  to  be  true.   Similarly,  the  residue 
of  U  about  Y'  must  assume  X  to  be  false. 

Let  us  now  try  the  generalized  procedure  on  R-  and  R , ,  remembering  that 
C  implies  A. 

R3     A  +  B 

R,      C 

4 

Neither  rule  1  nor  rule  2  applies  so  applying  rule  3  and  choosing  B  as 
pivot  variable,  we  have 
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Residue  about  B  Residue  about  B' 

R^  A+l  =  l  A+0  =  A 

R4         C  C 

The  residue  of  R,  about  B  implies  the  residue  of  R-  about  B  by  rule  2,  but 
neither  rule  1  nor  rule  2  applies  to  the  residues  about  B',  so  we  invoke  rule 
3,  choosing  C  as  pivot  variable: 

Residue  about  C  Residue  about  C' 

Residue  of  R  about  B'     A  =  1  (by  C  implies  A)    A 
Residue  of  R.  about  B'     1  0 

Both  residues  of  R,  imply  their  corresponding  residues  of  R_,  therefore 
R,  implies  R-. 

Extensions :   The  general  procedure  descirbed  herein  is  capable  of  a 
great  many  extensions: 

1.  File  structure  —  Our  definition  of  a  master  file  is  phrased  so 
that  it  includes  files  stored  on  direct-access  devices  as  well  as  sequential 
devices.   The  notion  of  a  strip  file  is  easily  generalized  then  to  a  list 

of  those  records  stored  in  a  direct-access  master  file  which  meet  some  cri- 
terion (request) . 

2.  Other  Relationships  —  Equivalence,  negation,  contraimplication,  etc., 

4 
can  be  determined  by  simple  modifications  of  the  procedure  described  here. 

3.  Other  Operations  —  All  logical  functions  of  two  variables  (there 

are  at  most  16)  can  be  easily  handled  by  appropriate  additions  to  rules  a)  -  g) ^ 
Moreover,  one  practical  situation  which  we  have  recently  faced  required  the 
definition  of  n-ary  equivalents  of  "or"  and  "and".    Extension  of  this  tech- 
nique to  n-ary  operations  is  straight  forward. 
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4.   Finding  the  shortest  qualifying  strip  file  —  If  we  order  strip 
files  S^...S   in  ascending  order  of  length,  then  iteratively  apply  this 
technique  to  each  R.  in  turn  with  respect  to  R  .  ■.  .  the  first  R  implied  by 
R    is  the  shortest  qualifying  file.   If  none  qualify,  the  master  file 
must  be  used. 


Conclusion 

The  "Expansion  Theorem"  is  a  powerful  tool  in  the  analysis  of  the  con- 
tents of  data  files. 
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